Abstract

The efficient operation of heating systems relies heavily on accurate temperature monitoring. However, such
systems are vulnerable to data corruption, which can lead to erroneous temperature readings and potentially
hazardous conditions. We propose a novel approach to address this challenge by employing machine learning
techniques for spam detection and correction in heater temperature data.

We explore various machine learning algorithms, including spam detection and classification models, to
identify spam temperature readings. These models are trained on the preprocessed dataset and evaluated
using appropriate metrics to assess their performance.

Keywords: Spam Detection, Temperature Control, Heater Systems, Python, Machine Learning, Linear Regression.


1. Introduction 

In modern heating systems, maintaining precise 
temperature control is essential for efficiency and 
comfort. In this research paper, we propose a novel 
approach to smart spam detection and correction for 
temperature monitoring in heaters using Python
machine learning techniques. Machine learning 
offers a powerful toolkit for identifying patterns and 
spam in large datasets, making it well-suited for this 
task. Traditional methods of spam detection often 
rely on predefined thresholds or simple statistical 
measures, which can be insufficient in handling 
complex and dynamic environments typical of 
heating systems. Machine learning algorithms, on the 
other hand, can learn from historical data, adapt to 
new patterns, and provide more accurate and reliable 
detection and correction mechanisms. In the context 
of heating systems, temperature spam can have 
severe consequences, such as increased energy 
consumption, equipment damage, and reduced
comfort levels. Moreover, incorrect temperature 
readings can also lead to inaccurate diagnoses and 
inefficient maintenance procedures. Therefore, it is 
essential to develop effective methods to detect and 
correct temperature spam in real time. Existing
methods for detecting spam data in smart heaters
typically rely on manual inspection or rule-based
filtering, which can be time-consuming and prone to
errors. These methods may not be effective in
detecting complex patterns of spam data. The
significance of this research lies in its potential to
revolutionize temperature monitoring systems in
various domains, including residential and industrial
heating applications. By mitigating the impact of
spam data, our proposed solution not only improves
temperature control but also reduces energy
consumption and maintenance costs. The
contributions of this paper are as follows:

1. We propose a machine learning-based
approach for detecting spam data in smart 
heaters, which can improve the accuracy and 
reliability of temperature readings. 

2. We develop a correction algorithm that can 
adjust the temperature readings based on the 
detected spam data, ensuring that the heating 
system operates efficiently and safely. 

2. Literature Survey 

2.1. Spam 
The heater adjusts its power output based on the 
temperature of the room. When the room temperature 
is lower than the set point, the heater will increase 
power output to heat the room faster. As the room 
temperature approaches the set point, the heater will
gradually decrease its power output to maintain the
set temperature.
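
The behaviour described above resembles a proportional controller. The following is a minimal sketch under that assumption; the function name `heater_power`, the gain of 0.2 and the power range are hypothetical values chosen for illustration and are not taken from the heater studied in this paper.

```python
def heater_power(room_temp, set_point, gain=0.2, max_power=1.0):
    """Return a power fraction in [0, max_power] proportional to the temperature deficit."""
    error = set_point - room_temp            # positive when the room is too cold
    power = gain * error                     # proportional response (assumed control law)
    return max(0.0, min(max_power, power))   # clamp to the heater's physical range


# Example: with a set point of 22 C, power falls as the room warms up.
for temp in (10.0, 18.0, 21.0, 22.0, 24.0):
    print(temp, heater_power(temp, set_point=22.0))
```

Far below the set point the output saturates at full power, and at or above the set point it drops to zero, which matches the qualitative description in this subsection.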

2.2. The Benefits of Spam Include 

1. Faster Heating: By increasing power output 
when the room is cold, spam helps to heat the 
room quickly. 

2. Energy Efficiency: By adjusting power 
output based on temperature, spam can help 
reduce energy consumption and minimize 
standby losses. 

3. Improved Temperature Control: Spam 
helps to maintain a consistent temperature by 
adjusting power output to match changing 
room conditions. 

2.3. Spam Detection and Correction 

This involves looking into existing methods,
algorithms and techniques used for spam detection
and correction in various domains such as email,
social media, heaters, or other communication
channels.

2.4. Temperature Control in Heaters 
Exploring literature related to temperature control 
systems, especially in heaters or similar devices, will 
provide insights into different control strategies, 
feedback mechanisms and algorithms used to 
maintain or regulate temperature effectively and 
efficiently. 

2.5. Python Machine Learning 
Reviewing literature on machine learning techniques 
and algorithms implemented using Python will
provide a foundation for understanding the tools and 
methodologies available for building machine 
learning tools. This includes exploring topics such as 
classification, regression, clustering and NLP, among 
others. 

2.6. Integration of Machine Learning in Temperature Control
Investigating studies that have applied machine 
learning techniques to temperature control systems or 
similar domains can offer valuable insights into the 
challenges, opportunities and best practices for 
integrating machine learning into such systems. 
3. Methodology
This section describes the method used for spam
detection and correction of heater temperature data
using a machine learning algorithm.

3.1. Software 
To implement this project using Python machine
learning, I used Jupyter Notebook and various
libraries. Python is the primary programming
language used for machine learning projects due to
its simplicity, readability, and the availability of
numerous libraries for data manipulation,
visualization, and machine learning model
development.


[Figure 1: flowchart with the steps Collect Temperature in Heater Dataset, Data Pre-Processing, Data Cleaning, Split and Train Dataset, Feature Selection, Experimental Data, Apply Machine Learning Algorithms, Spam Detection and Correction for Temperature in Heater]

Figure 1 Flowchart of Spam Detection in Temperature in Heater


We start with spam detection and correction for
heater temperature data, following the flow shown in
Figure 1. After the data has been collected, it is not
directly applicable to training our machine learning
model, so it first needs to be preprocessed.
Following that, I split the data into training and
test data, which are used for training and evaluating
our model. Once that is done, I feed the data into
our Linear Regression model. [1]
3.2. Importing Libraries 
I started by importing the libraries used, as shown
in Figure 2. Matplotlib is used to plot graphs,
histograms and bar plots. Pandas is used for data
manipulation and analysis, and NumPy for
mathematical and scientific operations. [2] Seaborn is
used for making statistical graphics more attractive
and informative. Statsmodels is used for statistical
modeling and hypothesis testing.


import pandas as pd 

import numpy as np 

import matplotlib.pyplot as plt 
import seaborn as sns 

import statsmodels.api as sm 


Figure 2 Packages 


4. Data Collection 
Collect a dataset of temperature readings from the 
heater, along with corresponding labels indicating 
whether the reading is normal or spam. 

4.1. Dataset 
The dataset used in this research paper is available on
Kaggle. Its first rows are shown in Figure 3.


import pandas as pd
df = pd.read_csv("Temperature heater dataset.csv")

df.head(5)

Figure 3 Dataset


5. Data Preprocessing 
Clean and preprocess the data for analysis. This 
might involve handling missing values, normalizing 
the data and labeling the data if necessary. 
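
As an illustration of the normalization step mentioned above, the following is a minimal sketch of min-max scaling in plain Python. The function name `min_max_normalize` and the sample readings are assumptions made for this example; the paper does not state which normalization was applied.

```python
def min_max_normalize(values):
    """Scale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


temperatures = [20.0, 25.0, 30.0, 40.0]   # made-up readings in degrees C
normalized = min_max_normalize(temperatures)
print(normalized)
```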

5.1. Data Cleaning 

5.1.1. Handling Missing Values

Missing values are checked as shown in Figure 4 for
demonstration and are filled using the rolling mean
of the temperature column.

# check for missing values in dataset
df.isnull().sum()

Time (s)              0
CO (ppm)              0
Humidity (%r.h.)      0
Temperature (C)       0
Flow rate (mL/min)    0
Heater voltage (V)    0
R1 (MOhm)             0
R2 (MOhm)             0
R3 (MOhm)             0
R4 (MOhm)             0
R5 (MOhm)             0
R6 (MOhm)             0
R7 (MOhm)             0
R8 (MOhm)             0
R9 (MOhm)             0
R10 (MOhm)            0
R11 (MOhm)            0
R12 (MOhm)            0
R13 (MOhm)            0
R14 (MOhm)            0
dtype: int64

Figure 4 Handling Missing Values
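
The rolling-mean fill described in this subsection can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function name `fill_with_rolling_mean`, the trailing window of preceding values and the window size are hypothetical, since the paper does not give the exact window. In pandas, a comparable result would typically be obtained by combining `Series.rolling(...).mean()` with `fillna`.

```python
def fill_with_rolling_mean(values, window=3):
    """Replace None entries with the mean of up to `window` preceding known values."""
    filled = []
    for v in values:
        if v is None:
            # Use the most recent values, including ones that were filled earlier.
            recent = [x for x in filled[-window:] if x is not None]
            v = sum(recent) / len(recent) if recent else None
        filled.append(v)
    return filled


readings = [20.0, 21.0, None, 23.0, None]   # made-up temperature column
print(fill_with_rolling_mean(readings, window=2))
```

A gap at the very start of the series has no preceding values and is left as None in this sketch.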


5.1.2. View Summary of Dataset
A typical dataset for temperature monitoring in
heaters might include the columns shown in
Figure 5.


# view summary of dataset
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 295719 entries, 0 to 295718
Data columns (total 20 columns):
 #   Column              Non-Null Count   Dtype
 0   Time (s)            295719 non-null  float64
 1   CO (ppm)            295719 non-null  float64
 2   Humidity (%r.h.)    295719 non-null  float64
 3   Temperature (C)     295719 non-null  float64
 4   Flow rate (mL/min)  295719 non-null  float64
 5   Heater voltage (V)  295719 non-null  float64
 6   R1 (MOhm)           295719 non-null  float64
 7   R2 (MOhm)           295719 non-null  float64
 8   R3 (MOhm)           295719 non-null  float64
 9   R4 (MOhm)           295719 non-null  float64
 10  R5 (MOhm)           295719 non-null  float64
 11  R6 (MOhm)           295719 non-null  float64
 12  R7 (MOhm)           295719 non-null  float64
 13  R8 (MOhm)           295719 non-null  float64
 14  R9 (MOhm)           295719 non-null  float64
 15  R10 (MOhm)          295719 non-null  float64
 16  R11 (MOhm)          295719 non-null  float64
 17  R12 (MOhm)          295719 non-null  float64
 18  R13 (MOhm)          295719 non-null  float64
 19  R14 (MOhm)          295719 non-null  float64
dtypes: float64(20)
memory usage: 45.1 MB

Figure 5 Summary of Dataset


5.1.3. Columns
The columns attribute of the pandas dataframe, shown
in Figure 6, lists the names of the columns in the
dataframe.


df.columns

Index(['Time (s)', 'CO (ppm)', 'Humidity (%r.h.)', 'Temperature (C)',
       'Flow rate (mL/min)', 'Heater voltage (V)', 'R1 (MOhm)', 'R2 (MOhm)',
       'R3 (MOhm)', 'R4 (MOhm)', 'R5 (MOhm)', 'R6 (MOhm)', 'R7 (MOhm)',
       'R8 (MOhm)', 'R9 (MOhm)', 'R10 (MOhm)', 'R11 (MOhm)', 'R12 (MOhm)',
       'R13 (MOhm)', 'R14 (MOhm)'],
      dtype='object')

Figure 6 List of Columns

5.2. Separate Feature and Target Variable
5.2.1. Feature Variable
The feature variable, or independent variable, shown
in Figure 7, is the input data used to make
classifications. In this case, the feature variable
would be the temperature readings from the heater.


[Figure 7: table of the feature columns CO (ppm), Humidity (%r.h.), Temperature (C), Flow rate (mL/min), followed by the resistance columns (MOhm)]

Figure 7 Separate Feature Variable


5.2.2. Target Variable
The target variable, or dependent variable, shown in
Figure 8, is the variable we want to classify
based on the feature variables. In this case, the target
variable could be whether each temperature reading
is normal or spam.


0             0.000
1             0.309
2             0.618
3             0.926
4             1.234
            ...
295714    90988.545
295715    90988.853
295716    90989.162
295717    90989.469
295718    90989.778
Name: Time (s), Length: 295719, dtype: float64

Figure 8 Separate Target Variable


6. Splitting Training and Testing Data 
The next part is the most important, since we use one
set of data to train our model and another set to
evaluate it. In other words, part of X will be our
training data, and the other part will be our test data.
The same applies to Y. [3]

6.1. Training Dataset 
The training dataset, shown in Figure 9, is used
to train the machine learning model to detect and
correct spam temperature readings. It should include
a sufficient number of normal and spam temperature
readings. A common split is around 70-80% of the
data for training.

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(...)

[Figure 9: table of the training rows with the columns CO (ppm), Humidity (%r.h.), Temperature (C), Flow rate (mL/min), followed by the resistance columns (MOhm)]

Figure 9 Training Dataset


6.2. Testing Dataset 

The testing dataset, shown in Figure 10, is used
to evaluate the performance of the trained model. It
should also include a mix of normal and spam
temperature readings, but it should be distinct from
the training dataset to ensure an unbiased evaluation.
The remaining 20-30% of the data is typically used
for testing.


[Figure 10: table of the testing rows with the columns CO (ppm), Humidity (%r.h.), Temperature (C), Flow rate (mL/min), followed by the resistance columns (MOhm)]

Figure 10 Testing Dataset
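
The split into training and testing data can be illustrated with a small dependency-free sketch. The helper name `split_train_test`, the 25% test fraction and the seed are assumptions for this example and are not the settings used in the paper. The scikit-learn function `train_test_split` provides the same idea through its `test_size` and `random_state` parameters.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows reproducibly and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)      # fixed seed gives a repeatable split
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))                         # stand-in for the dataset rows
train_rows, test_rows = split_train_test(rows, test_fraction=0.25)
print(len(train_rows), len(test_rows))
```

Because every row ends up in exactly one of the two lists, the test set stays distinct from the training set, which is the property required for an unbiased evaluation.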


6.3. Model Training 
Linear regression is used to train the model on
the preprocessed data so that it learns patterns and
relationships between a dependent variable and one
or more independent variables, as shown in Figure
11.


from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(x_train_const, y_train)

LinearRegression()

Figure 11 Model Training

6.4. Compare Train and Test Set Accuracy
It’s essential to evaluate the performance of your 
model on both the training data and the testing data. 
This is because the training data is used to train the 
model, while the testing data is used to evaluate its 
performance on unseen data. 

Accuracy Formula: 
Accuracy is a measure of how well your model
predicts the correct output, as shown in Figure 12.
Accuracy = (TP+TN)/(TP+TN+FP+FN) 
Where: 
TP = True Positives (correct predictions) 
TN = True Negatives (correct predictions) 
FP = False Positives (incorrect predictions) 
FN = False Negatives (incorrect predictions) 


y_pred_train = model.predict(x_train)
y_pred_train

array([...])

Figure 12 Train and Test Set Accuracy
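
The accuracy formula above can be checked with a few lines of Python. The counts in this sketch are invented for illustration and are not results from this paper.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions to score")
    return (tp + tn) / total


# Hypothetical counts: 90 of 100 predictions are correct.
example_accuracy = accuracy(tp=40, tn=50, fp=4, fn=6)
print(example_accuracy)
```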


6.5. Training and Test Set Accuracy Score 
The accuracy score is a measure of how well your
machine learning model performs on the test set. This
is shown in Figure 13. It is the proportion of correctly
classified instances (correct/incorrect temperature
readings) out of the total number of instances in the
test set.

Formula:
Accuracy = (TP+TN)/(Total Samples)
Where:

1. True Positives (TP) = Correctly classified
correct temperature readings.

2. True Negatives (TN) = Correctly classified
incorrect temperature readings.

3. Total Samples = Total number of instances in
the test set.


print('Training set score: {:.4f}'.format(model.score(x_train, y_train)))
print('Test set score: {:.4f}'.format(model.score(x_test, y_test)))

Training set score: 0.4411
Test set score: 0.4382

Figure 13 Accuracy Score
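
One point worth keeping in mind when reading these scores: for a scikit-learn `LinearRegression` model, the `score` method returns the coefficient of determination (R squared) rather than a classification accuracy. The sketch below shows how that quantity is computed; the helper name `r2_score_manual` and the sample values are made up for illustration.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)              # total sum of squares
    # Note: ss_tot is zero for a constant target, which this sketch does not handle.
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
score = r2_score_manual(y_true, y_pred)
print(score)
```

A value of 1.0 means the predictions match the targets exactly, while values near 0 mean the model does little better than always predicting the mean.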
6.6. Confusion Matrix
A confusion matrix is a matrix that summarizes the
performance of a machine learning model on a set of
test data. It is a means of displaying the number of
accurate and inaccurate instances based on the
model's predictions. It is often used to measure the
performance of classification models, which aim to
predict a categorical label for each input instance.
It is a specific table layout that allows visualization
of the performance of an algorithm, typically a
supervised learning model. It is used for classification
problems where the output can be two or more
classes. The matrix itself is a square table with
dimensions equal to the number of classes in the
classification problem. The confusion matrix for this
work is shown in Figure 14. A confusion matrix is a
very good way to understand results like true
positives, false positives, true negatives and so on [4].


[Figure 14: confusion matrix with the true label (Not Spam, Spam) on the vertical axis and the predicted label (Not Spam, Spam) on the horizontal axis]

Figure 14 Confusion Matrix
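
To make the four cells of a two-class confusion matrix concrete, the following sketch counts them from a pair of label lists. The function name `confusion_counts` and the labels are hypothetical and only illustrate the layout. They do not reproduce the values behind Figure 14.

```python
def confusion_counts(y_true, y_pred, positive="Spam"):
    """Return (tp, tn, fp, fn) for binary labels, with `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1      # predicted Spam, actually Spam
            else:
                fp += 1      # predicted Spam, actually Not Spam
        else:
            if actual == positive:
                fn += 1      # predicted Not Spam, actually Spam
            else:
                tn += 1      # predicted Not Spam, actually Not Spam
    return tp, tn, fp, fn


y_true = ["Spam", "Spam", "Not Spam", "Not Spam", "Spam"]
y_pred = ["Spam", "Not Spam", "Not Spam", "Spam", "Spam"]
counts = confusion_counts(y_true, y_pred)
print(counts)
```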


6.7. Precision & Recall 
Precision is the ratio of true positive predictions to
the total predicted positives.

Precision = TP/(TP+FP)
Recall is the ratio of true positive predictions to
the total actual positives.
Recall = TP/(TP+FN)

TP = True Positive
FP = False Positive
TN = True Negative
FN = False Negative
Precision = 99%
Recall = 91.3%
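
The two formulas can be sketched directly in Python. The counts used here are invented to show the arithmetic and are unrelated to the 99% and 91.3% reported above.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP); returns 0.0 when nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Recall = TP / (TP + FN); returns 0.0 when there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


p = precision(tp=8, fp=2)   # 8 of 10 predicted positives are correct
r = recall(tp=8, fn=8)      # 8 of 16 actual positives are found
print(p, r)
```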
7. Machine Learning Algorithm 
I have used the linear regression algorithm in this project.

7.1. Linear Regression 
Linear regression is a type of supervised machine
learning algorithm that computes the linear
relationship between the dependent variable and one
or more independent features by fitting a linear
equation to observed data [5]. When there is only one
independent variable, it is known as Simple Linear
Regression, and when there are multiple independent
variables, it is known as Multiple Linear Regression.
As a supervised machine learning algorithm, it learns
from the labelled dataset and maps the data points to
the most optimized linear function, which can be
used for predictions on new datasets [6].
Classification: It predicts the class of the dataset
based on the independent input variable. The class is
a categorical or discrete value [7].
Regression: It predicts continuous output
variables based on the independent input variable,
like the predictions of spam mail [8-10].

7.2. Types of Linear Regression 

7.2.1. Simple Linear Regression
This is the simplest form of linear regression, and it
involves only one independent variable and one
dependent variable. The equation for simple linear
regression is:
$$Y = \beta_0 + \beta_1 X$$

Where:

• $Y$ is the dependent variable

• $X$ is the independent variable

• $\beta_0$ is the intercept

• $\beta_1$ is the slope
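
As an illustration of how the intercept and slope can be estimated, the following sketch applies the ordinary least squares closed form for a single variable. The function name `fit_simple_linear` and the data points are made up for this example, and the sketch is not the code used in this paper.

```python
def fit_simple_linear(xs, ys):
    """Least-squares estimates (intercept, slope) for Y = b0 + b1 * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)   # must be non-zero: xs cannot all be equal
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]    # points lying exactly on Y = 1 + 2X
b0, b1 = fit_simple_linear(xs, ys)
print(b0, b1)
```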

7.2.2. Multiple Linear Regression

This involves more than one independent variable
and one dependent variable. The equation for
multiple linear regression is [11]:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$$
Where:
• $Y$ is the dependent variable
• $X_1, X_2, \dots, X_n$ are the independent variables
• $\beta_0$ is the intercept
• $\beta_1, \beta_2, \dots, \beta_n$ are the slopes
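
Evaluating the multiple linear regression equation for one observation amounts to an intercept plus a weighted sum of the features. A minimal sketch follows; the function name `predict_linear` and the coefficient values are hypothetical.

```python
def predict_linear(intercept, coefficients, features):
    """Evaluate Y = b0 + b1*X1 + ... + bn*Xn for a single observation."""
    if len(coefficients) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    return intercept + sum(b * x for b, x in zip(coefficients, features))


# Hypothetical model with three features: 1 + 2*3 - 1*4 + 0.5*2
y_hat = predict_linear(1.0, [2.0, -1.0, 0.5], [3.0, 4.0, 2.0])
print(y_hat)
```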
8. Calculate Spam in Temperature
Calculate the spam in temperature based on the
provided variables. We denote the spam in
temperature as $\Delta T$, which is the difference between the
maximum and minimum temperature.
8.1. Identify Variables
$CO$: Carbon monoxide concentration
$H$: Humidity
$T$: Temperature
$F$: Flow rate
$V$: Heater voltage
$R$: Resistance
$R_i$: Values of resistors
$\beta_0$: Intercept
$\beta_1, \beta_2, \dots, \beta_{5+n}$: Coefficients
for each independent variable
$\varepsilon$: Error term
8.2. General Linear Model for Temperature
Assuming a linear relationship (which can be
modified if the relationship is known to be
non-linear):
$$T = \beta_0 + \beta_1 CO + \beta_2 H + \beta_3 F + \beta_4 V + \beta_5 R + \sum_{i=1}^{n} \beta_{5+i} R_i + \varepsilon$$
8.3. Spam Calculation
The spam of temperature, $\Delta T$, is given by:

$$\Delta T = T_{\max} - T_{\min}$$


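8.3.1. Determining T_max and T_min

To find $T_{\max}$ and $T_{\min}$, calculate the
temperature using the maximum and minimum
values of each variable.

Then,

$$T_{\max} = \beta_0 + \beta_1 CO_{\max} + \beta_2 H_{\max} + \beta_3 F_{\max} + \beta_4 V_{\max} + \beta_5 R_{\max} + \sum_{i=1}^{n} \beta_{5+i} (R_i)_{\max} + \varepsilon$$
$$T_{\min} = \beta_0 + \beta_1 CO_{\min} + \beta_2 H_{\min} + \beta_3 F_{\min} + \beta_4 V_{\min} + \beta_5 R_{\min} + \sum_{i=1}^{n} \beta_{5+i} (R_i)_{\min} + \varepsilon$$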
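
The calculation of the temperature spam as the difference between the two model evaluations can be sketched as follows. All coefficient values and variable ranges here are hypothetical placeholders, only three of the variables are included to keep the example short, and the error term is left out because it appears in both evaluations and cancels in the difference.

```python
def linear_temperature(intercept, coefs, values):
    """Evaluate the linear temperature model for one set of variable values."""
    return intercept + sum(coefs[name] * values[name] for name in coefs)


def temperature_spam(intercept, coefs, max_values, min_values):
    """Spam of temperature: T_max - T_min from the extreme values of each variable."""
    t_max = linear_temperature(intercept, coefs, max_values)
    t_min = linear_temperature(intercept, coefs, min_values)
    return t_max - t_min


# Hypothetical coefficients and ranges for CO, humidity and heater voltage.
coefs = {"CO": 0.5, "H": 0.25, "V": 2.0}
max_values = {"CO": 10.0, "H": 60.0, "V": 0.9}
min_values = {"CO": 0.0, "H": 20.0, "V": 0.2}
delta_t = temperature_spam(20.0, coefs, max_values, min_values)
print(delta_t)
```

Note that the intercept also cancels, so the result depends only on the coefficients and on the range of each variable.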

9. Result and Discussion 

In this section, all the results are presented in graphs
and tables and are discussed in detail. The code
assumes that the dataset is stored in a CSV file named
dataset.csv. The resulting CSV file is shown in
Figure 15. The code loads the data, defines the
features and target variable, creates a linear regression
model, fits the model to each dataset, predicts the
temperature for each dataset, calculates the error term
and the spam or not spam classification for each
dataset, and saves the results to a CSV file. Finally, it
evaluates the accuracy of the model using the
accuracy_score function from scikit-learn [12].

df.to_csv('Temperature_heater_dataset_output.csv', index=False)

[Figure 15: output table with the columns Flow rate (mL/min), Heater voltage (V), R1 to R14 (MOhm), and Spam Not Spam]

Figure 15 CSV File Spam Not Spam
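
The classification and saving steps described in this section can be sketched as below. This is an assumption-laden illustration rather than the paper's script: the rule of flagging a reading as spam when its absolute error exceeds a threshold, the threshold of 2.0, the correction by substituting the predicted value, and the column names are all hypothetical. The output is written to an in-memory buffer instead of a file so the sketch has no side effects.

```python
import csv
import io


def classify_readings(actual, predicted, threshold):
    """Label a reading 'Spam' when its absolute prediction error exceeds the threshold."""
    labels = []
    for a, p in zip(actual, predicted):
        error = a - p                                   # error term for this reading
        labels.append("Spam" if abs(error) > threshold else "Not Spam")
    return labels


actual = [21.0, 21.5, 35.0, 22.0]       # made-up measured temperatures
predicted = [21.2, 21.4, 21.8, 22.1]    # made-up model predictions
labels = classify_readings(actual, predicted, threshold=2.0)

# Assumed correction rule: replace flagged readings with the model prediction.
corrected = [p if label == "Spam" else a
             for a, p, label in zip(actual, predicted, labels)]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Temperature (C)", "Predicted (C)", "Spam Not Spam"])
writer.writerows(zip(actual, predicted, labels))
print(buffer.getvalue())
```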


Conclusion 

The conclusion is that the algorithm can be used to
classify whether a given temperature reading is spam
or not spam based on a linear regression model that
takes several factors into account. The algorithm can
be trained on a dataset and then used to predict
whether new temperature readings are spam or not
spam. The script automates the process of calculating
the temperature, classifying the data, and saving the
results to new CSV files. The coefficients and
threshold can be adjusted as needed for a specific
application.